April 3, 2010 at 4:02 pm
I'm in an OLTP environment and am having issues with table contention. Some of this stems from a situation where we have a table with let's say 50K records in it. We'll say that the data distribution is 500-1000 records per customer. There is code peppered all over the database to update all the records for this customer for fields such as "Balance". This update occurs within a trigger and thus inside an implicit transaction. Meanwhile, other people are trying to select just the customer name/address and don't care about "Balance". What I'm looking for is an option that would allow the people selecting customer name/address to not have to wait on the trigger to finish before it will allow them to go through.
Now for the kicker. I can't update the procedures used for any of this, so I need something structure wise. I tried an indexed view which excluded the updated columns (Balance), but that doesn't seem to do any good. (It also waits on an open transaction that updates the balance column unless I specify the view in the FROM with the NO EXPAND hint).
Anyone have any ideas for anything else I could try?
Updates:
Things to Try:
Separate FileGroups on the same Array
Questionable benefit
If there is a benefit it will likely be minor... 5%'ish.
Index Changes
Look at the indexes involved, make sure indexes are being used for the updates.
Try to eliminate any hotspots.
Vertical Partitioning
Split the table and create a view with the old table name.
Ideally the 2 separate physical tables will alleviate contention for reads of non-updated fields.
Indexed Views
Very unlikely to have any beneficial effect due to the simplicity of the plans involved.
May actually have a detrimental effect because it is one more thing to keep up to date.
Isolation Level Changes
NOLOCK Hints or Transaction Isolation level Changes.
Not possible to do this everywhere, but it may help alleviate the strain from the more commonly run procedures.
April 3, 2010 at 4:13 pm
Chaniging the isolation level? Move that table to faster disks? Not sure what you can do here, Seth.
April 3, 2010 at 4:33 pm
Steve Jones - Editor (4/3/2010)
Chaniging the isolation level? Move that table to faster disks? Not sure what you can do here, Seth.
Hrm, the faster disks is an option to explore. I'll have to see if I can figure out a way to implement that. Changing the isolation level is also something. We've already got more NOLOCKS on our database than I can shake a stick at.
April 4, 2010 at 12:37 am
I second what Steve Jones said.
Then again, without more information any suggestions may be as useless. I know my first thought sounded great until I read your "cannot" change stored procedures; I imagine this also applies to Triggers.
Things to think about. How close to realtime does the queried for data need to match the in-flight data? If a reasonable delay is acceptable, then I would suggest setting up a Transactional Replication where the subscriber database (table or tables) is considered Read-Only. If you must, you can set up a Linked-Server connection from the current database instance and reference that through a new stored procedure or (I think) view; I would lean towards a stored procedure.
Take a look at the Trigger at make sure it is not running a cursor or some other activity like referencing a remote table via a Linked-Server (performance killer in transactions where OLTP is concerned).
I know of more than one situation where enabling Read-Committed Snapshots cut transaction-induced blocking, but you need to understand the limitations and impact of Read-Committed Snapshots and monitor.
Run Actual Query Plans for the update(s) and make sure you are performing in-place updates and not moving rows due to changing values. Check your index(es) for the table and see how many are fragmented and if they need their statistics updated.
Again, without knowing what your limitations are and what has already been done, relevant suggestions are nearly impossible.
April 4, 2010 at 1:01 am
Robert L Miller (4/4/2010)
Take a look at the Trigger at make sure it is not running a cursor or some other activity like referencing a remote table via a Linked-Server (performance killer in transactions where OLTP is concerned).
:hehe: The problem is, I KNOW they're running cursors and that is definitely part of the problem, but I really can't change it. (Way too much code involved in modifying them and it's not mine, it's a third party vendor's)
Changing code isn't really an option, nor unfortunately is transactional replication. The reads and the writes are all coming from the same front end ERP system. I can probably change code in a few places, but it's too widespread, and in many place generated by compiled middle layers that I don't even have access to.
Indexing is definitely a valid thing to look into, it's high on my priority list.
Still not sure how compression is going to affect this.
One of my thoughts is to take the table, vertically partition it into 2 tables with new names and ten create a view with the old name in the hopes that the mass updates will only lock that partition and not hold up reads. Highly unsure about that idea tho, and I might end up having to modify or re-write the triggers.
April 4, 2010 at 7:45 am
Continuing the theme Robert Miller started, if data readers can tolerate reading older data, also consider creating a database snapshot to implement the read-only access. This can work well for reporting requirements. Log shipping (with the database in Standby mode) is another option. A database snapshot over a mirror works well too.
Implementing a row-versioning isolation level is also a possible solution (you are already making use of the version store with all those triggers) but this might not work as well as all that, since the READCOMMITTEDLOCK hint is generally required in triggers to avoid incorrect behaviour. This hint returns us to default isolation level behaviour - shared locks are issued for reads, and versioned rows are not read.
Implementing RCS may therefore not solve the problem completely, and might introduce hard-to-trace bugs. Full SNAPSHOT isolation would require extensive code changes, and update conflicts can make things quite complex - and I am not sure you need that right now 😉
April 4, 2010 at 1:06 pm
Unfortunately this isn't reporting or I'd do just that. The access is done from a front end forms module. The same forms that pull up data do the updating, so I can't have 2 different databases. At any given time any given user can just be looking at data or update it all in the same place.
I didn't really expect there to be anything that can fulfill the ridiculous requirements of fixing the symptoms without curing the problem, but I figured I'd throw it out there anyways in case I was wrong.
Originally, I was hoping that the ability of the enterprise edition optimizer to expand to indexed views even when the base table is specified could somehow circumvent the locking, but it doesn't... unless I just didn't implement it properly.
April 4, 2010 at 1:09 pm
(Way too much code involved in modifying them and it's not mine, it's a third party vendor's)
My first inclination is to invite the officers of the third party vendor's to a huge pork chop dinner and remind them that word-of-mouth from an unhappy customer working with an uncooperative vendor travels faster than lightning. If they're the only show in town and they refuse, then it may be time to convince the managers you work for that a rewrite of certain pieces of the code by in-house folks would be well worth the effort and very well worth violating (and, therefor, negating) the supposed useless support agreement (if there even was one).
It's not worth having a support agreement that doesn't have a "performance" SLA. We went through that bit of insanity at one of my old jobs. A bit of dupe check code was designed to check all 31 billing cycle days for 3 months back (93 databases of 4 million rows each). It ran so slow that we could only check 2 months back which, of course, is an accident waiting to happen since the 3 month requirement came from the PUC of most States in the U.S. The vendor shirked us off by saying we were the only ones with the problem (they were right... we were also the only ones with as many customers... everyone else could have used a spreadsheet for their stuff). The code would run for 10 to 24 hours to sometimes (usually, actually) fail.
When I got done with it, it would do a full 3 months worth of checks, ran in 11 minutes instead of 10 to 24 hours, and hasn't failed, yet. As a side bar, we didn't tell the vendor... we just stopped using that aspect of their code and it seems that because of the way the vendor had written the support agreement, we didn't actually violate the support agreement.
If you can first document the problem and then demonstrate a fix and the performance/scalability gains on a parallel system (like I did), your management will listen and your vendor may suddenly be happy to listen. Our vendor didn't... they seemed more comfortable with taking a serious black eye because even though the company still has to use parts of the original code, it's being replaced one module at a time and, I can assure you, THAT third party vendor has lost a lot of new customers based on the word-of-mouth done in elevators and on Golf courses.
Give the 3rd party vendor the opportunity to fail (which is the only way someone can actually succeed)... seriously have your managment sit down with their management with a couple of "system experts" from both sides and try to hammer out the problems. If that doesn't work, begin the replacement process because, even if they're sole source, they're just not worth having if they don't listen and act on your problems.
--Jeff Moden
Change is inevitable... Change for the better is not.
April 4, 2010 at 1:36 pm
Garadin (4/4/2010)
Unfortunately this isn't reporting or I'd do just that.
Ok. And RCS is a no-no because you can't modify the triggers, is that right?
Enterprise can be reasonably good at automatically matching indexed views to plans, but it will never choose an indexed view for locking reasons - only estimated cost. The main reasons for the QO not choosing an indexed view are:
1. It finds an 'good enough' plan using non-indexed views first. Indexed views are only considered quite late on in the optimization process - so many queries never make it that far.
2. No statistics are available on the indexed view. This is a frequent cause since automatic statistics are *never* created on indexed views - you have to do it manually.
April 5, 2010 at 7:46 am
Jeff Moden (4/4/2010)
(Way too much code involved in modifying them and it's not mine, it's a third party vendor's)
My first inclination is to invite the officers of the third party vendor's to a huge pork chop dinner and remind them that word-of-mouth from an unhappy customer working with an uncooperative vendor travels faster than lightning. If they're the only show in town and they refuse, then it may be time to convince the managers you work for that a rewrite of certain pieces of the code by in-house folks would be well worth the effort and very well worth violating (and, therefor, negating) the supposed useless support agreement (if there even was one).
It's not worth having a support agreement that doesn't have a "performance" SLA. We went through that bit of insanity at one of my old jobs. A bit of dupe check code was designed to check all 31 billing cycle days for 3 months back (93 databases of 4 million rows each). It ran so slow that we could only check 2 months back which, of course, is an accident waiting to happen since the 3 month requirement came from the PUC of most States in the U.S. The vendor shirked us off by saying we were the only ones with the problem (they were right... we were also the only ones with as many customers... everyone else could have used a spreadsheet for their stuff). The code would run for 10 to 24 hours to sometimes (usually, actually) fail.
When I got done with it, it would do a full 3 months worth of checks, ran in 11 minutes instead of 10 to 24 hours, and hasn't failed, yet. As a side bar, we didn't tell the vendor... we just stopped using that aspect of their code and it seems that because of the way the vendor had written the support agreement, we didn't actually violate the support agreement.
If you can first document the problem and then demonstrate a fix and the performance/scalability gains on a parallel system (like I did), your management will listen and your vendor may suddenly be happy to listen. Our vendor didn't... they seemed more comfortable with taking a serious black eye because even though the company still has to use parts of the original code, it's being replaced one module at a time and, I can assure you, THAT third party vendor has lost a lot of new customers based on the word-of-mouth done in elevators and on Golf courses.
Give the 3rd party vendor the opportunity to fail (which is the only way someone can actually succeed)... seriously have your managment sit down with their management with a couple of "system experts" from both sides and try to hammer out the problems. If that doesn't work, begin the replacement process because, even if they're sole source, they're just not worth having if they don't listen and act on your problems.
I agree with all of this, but in this situation it's a bit of a pickle. The problem isn't that we aren't allowed to modify things, we are... and we do constantly to keep the system running. We even send them our mods and they either implement them into their repository for a specific version for us or they implement it in their standard product depending on the type of change. The issue is the scope. The more problematic triggers are 3-4 thousand lines of code by themselves and they nest 10+ levels deep into 25-30 thousand lines of code among all the sp's/functions/table updates that run other triggers etc. that they call. Within that code are several hundred IF statements testing specific fields for values and taking completely different paths. Re-writing even one of these triggers to not use cursors etc. is an undertaking of weeks if not months of effort. It's not an issue of admitting there is a problem, it's an issue of allotting the time to resolve it. Everything is so spiderwebbed out and touches so many different things (in addition to the middle layer which we don't have access to and can't properly take into account) that it's extremely difficult to modify things. We're moving to 2008 on the 16th (finally) and I'd like to have something in place by then. That said, we can do the upgrade exactly as we are, this isn't a requirement, just something I was exploring.
Back to the indexed view, I had a feeling my test was much too simple and the query plan was too simplistic. Perhaps I'll create a few indexed views and then check the DMV's to see if those indexes get used once we have a load on the system. Any little bit helps, and so long as the views don't include these commonly updated fields, it shouldn't add additional write overhead to have them in place. The rest of the data is very stagnant.
April 5, 2010 at 10:18 am
Seth, you're in a tough spot.
I think you know what makes sense, and perhaps the best idea is to go for things in a piecemeal fashion. Try to get some small wins, indexing, rewriting some code for approval, try to offload some reads elsewhere?
Disks still might be a good win, but definitely review indexing first.
April 5, 2010 at 11:35 am
As of right now, our entire DB is in one file/filegroup. Would splitting this into multiple files provide any benefit in this regard? They'd all still be on the same raid array. Probably an extremely silly question, but I have very little knowledge of that area.
April 5, 2010 at 11:42 am
I think from this entry (http://www.sqlskills.com/BLOGS/PAUL/post/Search-Engine-QA-12-Should-you-create-multiple-files-for-a-user-DB-on-a-multi-core-box.aspx) I'd say it might help. Paul mentions that SS2K8 has some parallel enhancements here.
Unless you can separate things out physically, however, I would not be sure it's a big gain. However it's easy, and if you can do it with some downtime, a small win might help.
April 5, 2010 at 11:51 am
Thanks again Steve. I think I might actually (for this one table at least) play with the idea of vertical partitioning and see how it works out. I've got a couple weeks to play with this still, so I might as well test it out.
April 5, 2010 at 11:57 am
Garadin (4/5/2010)
As of right now, our entire DB is in one file/filegroup. Would splitting this into multiple files provide any benefit in this regard? They'd all still be on the same raid array. Probably an extremely silly question, but I have very little knowledge of that area.
I think you may see some advantages. I moved a DB from 1 Filegroup to several filegroups. The files remained on the same RAID array and we were able to see an increase in performance. The increase was minimal but was present (1%).
Best results would be to do as Steve suggested - separate Arrays or LUNS. One database that I did this too was on a SAN and we just split the filegroups up to two luns and saw a 5% improvement.
Jason...AKA CirqueDeSQLeil
_______________________________________________
I have given a name to my pain...MCM SQL Server, MVP
SQL RNNR
Posting Performance Based Questions - Gail Shaw[/url]
Learn Extended Events
Viewing 15 posts - 1 through 15 (of 34 total)
You must be logged in to reply to this topic. Login to reply